ABSTRACT

Unified Human Interactome (UniHI) (http://www.unihi.org)
is a database for retrieval, analysis and visualization
of human molecular interaction networks.
Its primary aim is to provide a comprehensive and 
easy-to-use platform for network-based investiga- 
tions to a wide community of researchers in biology 
and medicine. Here, we describe a major update 
(version 7) of the database previously featured in the
NAR Database Issue. UniHI 7 currently includes
almost 350000 molecular interactions between 
genes, proteins and drugs, as well as numerous 
other types of data such as gene expression and 
functional annotation. Multiple options for interactive 
filtering and highlighting of proteins can be employed 
to obtain more reliable and specific network struc- 
tures. Expression and other genomic data can be 
uploaded by the user to examine local network struc- 
tures. Additional built-in tools enable ready identifi- 
cation of known drug targets, as well as of biological 
processes, phenotypes and pathways enriched with 
network proteins. A distinctive feature of UniHI 7 is 
its user-friendly interface designed to be utilized in 
an intuitive manner, enabling researchers less ac- 
quainted with network analysis to perform state- 
of-the-art network-based investigations. 

INTRODUCTION 

The study of molecular systems and networks is now 
a major field in biology and medicine. The goals of 
network-based investigations range from prioritization 
of candidate genes to determination of complex molecular 
mechanisms underlying a disease or a biological process 
(1,2). An essential prerequisite for these investigations is 
the availability of resources for molecular interactions in
model organisms and humans. To address this need,
various databases have been established in recent years 
(3). Especially for protein-protein interactions in 
humans, many initiatives and research groups have 
contributed large sets of data derived from the literature, 
high-throughput methods or computational prediction 
(4-15). In parallel, a wide range of dedicated programs
for network analyses have also been developed (16-18). 

However, it is a common experience that current re- 
sources and tools pose a considerable challenge to users, 
especially to researchers less acquainted with concepts 
in network biology. Frequently, users have to download, 
map, compile and integrate distinct data types to conduct 
network-based investigations. These activities require ex- 
tensive knowledge of data processing and management. 
Thus, a salient 'bottleneck' exists for many interested 
researchers between the wealth of available molecular 
interaction data and their utilization. 

This observation motivated us to develop a new version 
of the Unified Human Interactome (UniHI) database for 
the retrieval, analysis and visualization of human molecu- 
lar interaction networks: UniHI 7. We provide a platform 
that enables (i) retrieval of an integrated set of interactions 
from the major resources, (ii) intuitive use of tools for 
network-based investigations and (iii) easy utilization of 
complementary data and information for analysis, evalu- 
ation and visualization of retrieved networks. 

SCOPE OF UNIHI 7 

UniHI 7 integrates ~350000 molecular interactions for 
more than 30000 human proteins. It is based on a 
complete re-implementation of previous versions of 
UniHI, with widely extended scope and functionality. 
Besides protein-protein interactions from 12 different re- 
sources [including HPRD (4), BioGrid (5), IntAct (6), DIP 
(7), BIND (8) and Reactome (9) databases; as well as four 
interaction maps produced by computational predictions 
(10-13) and two high-throughput yeast-2-hybrid (Y2H)
screens (14,15)], UniHI 7 also comprises curated transcriptional
regulatory interactions from three complementary databases:
TRANSFAC (19), miRTarBase (20) and
HTRIdb (21). In addition to these interactions, we also
integrated drug target information from DrugBank (22) 
that can be mapped on the interaction network. A detailed
description of the incorporated resources can
be found on the UniHI 7 web page and in the
Supplementary Materials (Supplementary Table S1).
Whereas former UniHI versions can primarily be 
regarded as integrated databases for human protein- 
protein interactions (23,24), additional strengths of 
UniHI 7 lie not only in the integration of regulatory inter- 
actions but also in its interactive analysis and visualization 
tools for molecular networks. Although there are other 
databases with integrated molecular interaction data 
(25-34), UniHI provides a distinct and unique set of fea- 
tures ranging from simple filtering options to advanced 
network analysis tools (Supplementary Materials and 
Supplementary Table S2). The main application of 
UniHI 7 is the retrieval and examination of small to 
medium-sized local networks. It is ideally suited for
researchers who want to explore the molecular context of
a single protein or a select set of related proteins using a 
network-orientated approach. 

SEARCHING FOR INTERACTIONS IN UNIHI 7 

The UniHI 7 database can be queried for molecular inter- 
actions of single or multiple human proteins. Various 
identifiers such as gene symbol, Entrez Gene, Uniprot 
and Ensembl IDs can serve as input. It is also possible 
to input gene and protein identifiers from the model 
organisms yeast (Saccharomyces cerevisiae), worm
(Caenorhabditis elegans), fly (Drosophila melanogaster) or
mouse (Mus musculus), which will be automatically
mapped to human orthologs. This feature is convenient
for researchers who work with these major model organisms
and want to interrogate related human molecular
networks. In total, 2977 yeast, 6922 worm, 7998 fly and
15694 mouse genes were mapped to human orthologs
included in UniHI 7 using information from the HGNC 
database (35). As identifiers for model organisms, Entrez 
Gene IDs can generally be used, as well as systematic
names for yeast, WormBase IDs or gene symbols for 
worm, FlyBase IDs or gene symbols for fly and MGI 
identifiers or gene symbols for mouse. The list of identi- 
fiers and any other data uploaded by the user are only 
stored during the active session and are accessible only 
to the user. 
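
As a rough illustration of the mapping step described above, the following Python sketch resolves model-organism identifiers to human gene symbols through a lookup table. The table contents, the function name `map_to_human` and the shape of the return value are assumptions made for illustration only; they do not reflect how UniHI 7 stores or queries its HGNC-derived ortholog data.

```python
# Hypothetical ortholog table keyed by (species, identifier).
# The two entries are illustrative and are not taken from UniHI 7.
ORTHOLOGS = {
    ("mouse", "Snca"): ["SNCA"],
    ("fly", "park"): ["PARK2"],
}


def map_to_human(species, identifiers, table=ORTHOLOGS):
    """Return (mapped, unmapped).

    mapped   -- human gene symbols found for the input, duplicates removed
    unmapped -- input identifiers without a known human ortholog
    """
    mapped, unmapped = [], []
    for ident in identifiers:
        hits = table.get((species, ident))
        if hits:
            mapped.extend(hits)
        else:
            unmapped.append(ident)
    # dict.fromkeys removes duplicates while preserving input order
    return list(dict.fromkeys(mapped)), unmapped
```

Reporting unmapped identifiers separately lets a user see which query genes were dropped before the human network is retrieved.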



PARALLEL PRESENTATION OF PROTEINS, 
INTERACTIONS AND NETWORK 

Whereas in previous UniHI versions, queried proteins,
retrieved interactions and resulting networks were presented
in sequential order, UniHI 7 now displays them
on four web pages in parallel: 'Proteins', 'Physical
Interactions', 'Regulatory Interactions' and 'Network'.
This display scheme enables users to readily switch
between the different types of displayed information
(Figure 1). The 'Proteins' page provides the list of 
proteins in UniHI 7 matching the input and the
names of the databases or resources in which each 
protein is included. In addition, hyperlinks to Entrez 
Gene, Uniprot, RefSeq, OMIM IDs and KEGG (if avail- 
able) are given. On the 'Physical Interactions' and 
'Regulatory Interactions' pages, the set of detected inter- 
actions is shown with various additional information re- 
garding their source, evidence, type and functional 
annotation. A crucial feature is that interactions can 
easily be traced back to the original resources and publi- 
cations, and thus can be critically assessed by users. In 
addition, all the interactions displayed on these two 
pages can be downloaded as simple tables, which can
be used as input for other computational tools.

The retrieved interactions are displayed as a network on 
the 'Network' page. For network visualization, we utilized 
the recently developed Cytoscape Web, which is a client- 
side application implemented in Flex/ActionScript and 
modeled after the popular Cytoscape software (36). To
prevent the visualization tool from becoming unresponsive,
certain automatic layout and filtering procedures are
implemented for larger networks (Supplementary Materials).

Information about proteins and interactions can be 
interactively explored in the network graphics, avoiding 
cumbersome comparison with the textual output. For 
instance, clicking on the network nodes provides informa- 
tion about the corresponding protein and links to other 
resources such as GeneCards (37) and GeneMania (38) for 
follow-up study. The displayed network can be exported
as simple tab-delimited text or as an image, in either
PNG or PDF format.



INTEGRATED TOOLS FOR NETWORK ANALYSIS 

Molecular networks are inherently difficult to analyze and 
interpret. In fact, the sheer number of retrieved inter- 
actions for well-studied proteins (including many kinases 
and receptors) can be overwhelming for users (Figure 2a). 
To help users with these challenges, we have implemented 
several tools for filtering and inspecting networks, as well 
as for mapping and utilizing complementary data and 
information. The application of these tools can be
customized to different research objectives and can assist
in elucidating network structure and prioritizing
candidate proteins for follow-up studies.

FILTERING OF INTERACTIONS AND UTILIZATION 
OF GENOMIC DATA 

Filtering of interactions can be based on resource,
published evidence (i.e. number of PubMed references),
scale of experiment (i.e. small-scale or large-scale), type
of derivation (i.e. literature, computational prediction or
Y2H screens), connectivity (i.e. direct or indirect) and
interaction (i.e. binary or complex) (Figure 2b). These filtering
options can be tailored to produce more reliable
and specific networks, e.g. by including only interactions
reported in multiple publications.
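
The filtering criteria listed above can be sketched as simple predicates over interaction records. The Python example below is a minimal illustration, not UniHI 7 code: the `Interaction` record, its field names and the `filter_interactions` function are hypothetical, and only three of the criteria (published evidence, type of derivation and resource) are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical interaction record; field names are assumptions."""
    a: str
    b: str
    source: str        # resource name, e.g. "HPRD"
    pubmed_ids: tuple  # supporting PubMed references
    derivation: str    # "literature", "prediction" or "y2h"


def filter_interactions(interactions, min_refs=0, derivations=None, sources=None):
    """Keep interactions that satisfy every criterion that is set.

    min_refs    -- minimum number of distinct PubMed references
    derivations -- allowed derivation types, or None for no restriction
    sources     -- allowed resources, or None for no restriction
    """
    kept = []
    for it in interactions:
        if len(set(it.pubmed_ids)) < min_refs:
            continue
        if derivations is not None and it.derivation not in derivations:
            continue
        if sources is not None and it.source not in sources:
            continue
        kept.append(it)
    return kept
```

Calling `filter_interactions(records, min_refs=2)` would, for example, retain only interactions supported by at least two distinct publications.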
[Figure 2, panel (d): table of significantly enriched KEGG pathways]

Term                                          Count   P-value    FDR        Ratio
Parkinson's disease                           26      9.02E-13   1.38E-10   7.1739
Protein processing in endoplasmic reticulum   28      1.05E-11   8.10E-10   5.8039
Neurotrophin signaling pathway                23      1.70E-10   8.69E-9    6.249
Colorectal cancer                             15      4.66E-9    1.78E-7    8.7555
Cell cycle                                    21      7.20E-9    2.20E-7    5.4874
Pathways in cancer                            35      9.30E-9    2.37E-7    3.4850
ErbB signaling pathway                        17      1.40E-8    3.06E-7    6.7007
p53 signaling pathway                         15      2.25E-8    4.30E-7    7.6111
Ubiquitin mediated proteolysis                21      3.26E-8    5.55E-7    4.9660
MAPK signaling pathway                        30      4.39E-8    6.71E-7    3.6043
Gap junction                                  16      1.52E-7    2.12E-6    5.9321



Figure 2. Network analysis and visualization tools: (a) original network of the query proteins (GADD45A, SNCA and PARK2); (b) network after 
filtering based on published evidence (i.e. number of PubMed references >3) and with known drug targets highlighted in red; (c) network after 
filtering with gene expression data (GSE20186) (39). Nodes corresponding to genes with a log2 fold change > +0.2 are displayed in red and
those with a log2 fold change < -0.2 are shown in green; (d) significantly enriched KEGG pathways listed in a table; (e) filtered network with proteins linked to the
selected term 'Neurotrophin signaling pathway' highlighted in red; and (f) filtered network with proteins linked to 'Ubiquitin mediated proteolysis' 
highlighted in red. 
UniHI 7 also stores gene expression in 19 different
human tissue types derived from SymAtlas (40,41).
Users can apply these data to highlight or exclude
proteins (based on a chosen threshold level) to derive
tissue-specific networks. In addition to using gene expres- 
sion data stored in UniHI 7, users can upload their own 
expression data to filter and examine human molecular 
interaction networks. This feature can be applied to
detect network proteins that have distinct expression
patterns related to physiological processes or diseases.
Two different types of expression data can be used:
absolute gene expression, i.e. positive values for transcript
levels such as those detected by Affymetrix GeneChips or
RNA-Seq, and differential gene expression, i.e. changes in
expression derived from two-color arrays or by subtraction
of absolute expression measurements. Thresholds for
differential expression data can be set as maximum P-values
or minimum fold changes (Figure 2c).
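
To make the fold-change thresholding concrete, the following Python sketch labels genes by their log2 fold change and keeps only network edges between genes that pass the threshold. The default threshold of 0.2 is taken from the Figure 2 caption; the function names and the rule that both interaction partners must pass are assumptions for illustration and may differ from the behaviour of UniHI 7.

```python
def classify_by_fold_change(log2_fc, threshold=0.2):
    """Label each gene 'up', 'down' or 'unchanged' by its log2 fold change."""
    labels = {}
    for gene, value in log2_fc.items():
        if value > threshold:
            labels[gene] = "up"
        elif value < -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


def keep_changed_edges(edges, labels):
    """Keep edges whose two endpoints are both labelled 'up' or 'down'.

    Genes missing from `labels` are treated as not passing the threshold.
    """
    changed = {gene for gene, label in labels.items() if label != "unchanged"}
    return [(a, b) for a, b in edges if a in changed and b in changed]
```

In a display such as Figure 2c, the 'up' and 'down' labels would correspond to the red and green node colors, respectively.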

In addition, gene lists, for example, derived from RNAi 
screens or high-throughput assays, can be uploaded and 
utilized for annotation and filtering of interaction 
networks. Together with its capacity to overlay expression 
data, this option makes UniHI 7 an efficient platform for 
network-based analyses in the 'post-genomic' era. 



EXPLORING THE NETWORK: DRUG TARGETS, 
GENE ANNOTATION, PHENOTYPES AND 
DISEASES 

Small molecules (drugs) can influence activity of single 
proteins, alter pathogenic mechanisms and are of crucial 
importance in numerous therapeutic interventions. To fa- 
cilitate identification of known drug targets in the 
retrieved networks, UniHI 7 provides relevant highlight- 
ing and filtering options (Figure 2b) as well as information 
about the drugs and their mechanisms of action. From the
DrugBank database, information on 4203 drugs targeting
2139 annotated proteins was imported into
UniHI 7 (22) (Supplementary Materials).

The functional relevance of networks is inherently dif- 
ficult to assess. Hence, we have implemented a user- 
friendly integrated tool, which carries out enrichment 
analyses for molecular functions, biological processes, 
cellular location [as defined by Gene Ontology (42)], 
protein families [as defined by Pfam (43)] and pathways 
[as defined by KEGG (44)] of network proteins 
(Figure 2d). The significance of overrepresentation of 
network proteins in a Gene Ontology category, Pfam 
protein family or KEGG pathway is calculated using the 
hypergeometric test (which is equivalent to the one-tailed
Fisher's exact test) with the terms from the human genome as
background distribution. For significant terms (i.e. GO
categories, Pfam families or KEGG pathways), the
number of included network proteins, the P-value and
the false discovery rate for enrichment are displayed in a
table. The associated proteins can easily be highlighted in 
the network graphics by clicking on the corresponding 
term (Figure 2e and f). 
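
The enrichment calculation described above can be illustrated with a short, self-contained Python sketch. It computes the upper-tail hypergeometric P-value for one term and then adjusts a list of P-values for multiple testing. UniHI 7 performs these analyses in R/Bioconductor; the function names here are hypothetical, and the Benjamini-Hochberg procedure is an assumption, since the text does not state which false discovery rate method is used.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    k -- network proteins annotated with the term
    n -- proteins in the network
    K -- background proteins annotated with the term
    N -- proteins in the background (e.g. annotated human genome)
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when more items are requested than are available
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down to enforce monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Exact integer arithmetic via `math.comb` avoids the rounding problems of factorial-based formulas; for genome-scale backgrounds a statistics library would normally be preferred for speed.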

Finally, phenotype information can be assessed for 
network proteins in the new version of UniHI. For
this purpose, we have integrated gene-phenotype associations
curated in the Mouse Genome Database (45)
and mapped to their human orthologs, for several major 
phenotypes such as cardiovascular system or embryogen- 
esis. In addition, we have collected genes linked to aging 
in humans from the GenAge database (46), genes 
associated with cancer from the Cancer Gene Census cata- 
logue (47) and genes linked to human diseases from the 
OMIM database (48) (Supplementary Materials). 
Similarly to the type of analysis described earlier, the
UniHI user can assess whether phenotypic associations 
are overrepresented among network proteins, and high- 
light the associated proteins within the network. A help 
page with detailed description of the different tools and 
sample outputs for typical analyses is available on the 
UniHI 7 webpage. 



IMPLEMENTATION 

The architecture of UniHI 7 comprises a database and an 
application layer. The database layer is implemented using 
MySQL, an open source SQL relational database manage- 
ment system. The application layer is implemented using a 
J2EE architecture including, e.g. JDBC to connect to the 
back-end database, DAO for interacting with the database 
and accessing data and JavaServer Pages to generate web
pages. Data retrieval from the database is performed using 
the Hibernate library. The communication between client 
and the application layer is through a Tomcat server. 
To perform enrichment analyses, UniHI 7 connects 
via Rserve (http://www.rforge.net/Rserve/) to the R/ 
Bioconductor software (17). Matching of gene and 
protein identifiers was carried out using information 
from HGNC (35) and applying the g:Convert web tool 
(49). UniHI 7 performs best with small- to medium-sized 
networks (with up to several hundred interactions). For 
larger networks, visualization and analysis become
increasingly time-consuming.



CONCLUSIONS 

UniHI 7 is intended to serve as a bridge between resources 
for interaction data and more advanced software. 
It provides a user-friendly web-based platform to study 
networks underlying molecular mechanisms in human 
health and disease. Customization allows users to:
(i) adjust the set of included interactions, (ii) overlay
retrieved sub-networks with other types of data, (iii) inspect
networks for relevant genes, (iv) determine potential
network functions and (v) assess associations with phenotypes.
We hope that UniHI 7 will considerably facilitate 
network-orientated investigations for many researchers, 
especially for those who are new to this field. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 